Omega predicts whether the agent will one-box or two-box, and tells him before he does so.
The agent is The Contradictor: he will do the opposite of whatever Omega tells him he will do.
We can construct more interesting versions of the Impossible Newcomb’s Problem where the agents follow well-specified decision theories, as opposed to just “my decision theory is to do the opposite of what you say I will”. Compare:
The Cosmic Ray Problem
An agent has a choice between taking $1 and taking $100. There is an extraordinarily tiny but nonzero probability that a cosmic ray will spontaneously strike the agent’s brain in such a way that they will be caused to do the opposite of whichever action they would normally do. If they learn that they have been struck by a cosmic ray, then they will also need to visit the emergency room to ensure there’s no lasting brain damage, at a cost of $1000. Furthermore, the agent knows that they take the $100 if and only if they are hit by the cosmic ray.
If we wanted, we could add more exposition to this description, like “There’s a perfectly trustworthy and accurate predictor who has explained this entire situation to the agent.” But this isn’t strictly necessary, since fair decision problems generally assume the agent has an accurate understanding of the situation they’re in, and specifying how they acquired that information doesn’t change anything in this case.
The cosmic ray problem is inconsistent if the agent is running CDT or FDT, because the problem statement says “CDT and FDT take the $1 unless forced to do otherwise (by a cosmic ray),” but CDT and FDT’s decision algorithms demand that the agent not leave free money on the table: it’s impossible to run a CDT or FDT algorithm that willingly grabs the $1 instead of the $100 when doing so doesn’t increase their causal/functional expected utility. Another way of saying this is “you can’t put CDT or FDT agents into situations that make the above problem description true.”
In contrast, EDT agents can be put into dilemmas like the Cosmic Ray Problem, because if an EDT agent starts off believing the above problem description, they will indeed leave the $100 on the table, since taking it would be bad news about cosmic rays; and that is exactly the behavior the last sentence of the description claims.
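To make the contrast concrete, here’s a minimal sketch of the two kinds of bookkeeping, using only the $1/$100/$1000 figures from the problem statement; the specific prior probability of a ray strike is an illustrative assumption, since the problem only says it’s tiny.

```python
# Sketch of the two expected-utility calculations in the Cosmic Ray Problem.
# RAY_PRIOR is an illustrative assumption; the problem only says it's tiny but nonzero.

RAY_PRIOR = 1e-9   # prior probability of a cosmic ray strike
ER_COST = 1000     # cost of the emergency-room visit after a strike

def causal_eu(payoff):
    """CDT/FDT-style bookkeeping: choosing the $100 doesn't cause a ray strike,
    so the expected ER cost is the same tiny number for either choice."""
    return payoff - RAY_PRIOR * ER_COST

def evidential_eu(payoff, p_ray_given_action):
    """EDT-style bookkeeping: condition on the action having been taken. Given the
    belief 'I take the $100 iff I was struck', taking the $100 is near-certain
    bad news about the $1000 emergency-room visit."""
    return payoff - p_ray_given_action * ER_COST

print(causal_eu(100), causal_eu(1))                    # ~100 vs ~1: grab the $100
print(evidential_eu(100, 1.0), evidential_eu(1, 0.0))  # -900 vs 1: leave the $100
```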
Equivalently, EDT can be pushed around by predictors who convince the agent of things that turn out to be self-fulfilling prophecies, whereas the same isn’t true for CDT or FDT, since those decision algorithms aren’t “at the mercy of their priors” in the way EDT is.
One way of generalizing this point is that the purpose of running the right decision theory is actually to make it the case that problem specifications where you leave money on the table are internally inconsistent; and a decision theory is successful to the extent that it achieves this goal, thereby making it the case that as often as possible, the dilemmas you actually end up in are ones where you both can get a lot of utility, and do get a lot of utility. See “Decisions Are For Making Bad Outcomes Inconsistent” for a longer discussion of this.
XOR Newcomb’s Problem

The boxes aren’t filled by Omega. Omega comes and tells you: “If you one-box, you’ll get a million. If you two-box, you’ll only get a thousand.”
One of the lessons of the discussion of the smoking lesion problem in the FDT paper, and of the murder lesion problem in Cheating Death in Damascus, is that FDT (like CDT) reasons with something like a causal graph of the decision problem it’s in, and if multiple graphs are consistent with the problem description, then FDT’s behavior may be underdetermined by the problem description. In this case, I can think of at least three causal stories you might tell for how Omega’s claim ends up being true, along with some potential resolutions to ambiguities in what Omega’s saying:
Although Omega itself didn’t fill the boxes, Omega observed some other process (we might call it “Omega+1”) that in some fashion computes the agent’s decision function in advance, and fills the boxes based on this function’s output.
The boxes are connected to an explosive device that’s set up to detect if the agent touches one of the boxes, and to destroy the contents of the other box as soon as the first touch is detected.
Before deciding whether to approach the agent, Omega first observed whether the opaque box had $1,000,000 in it or not. If the opaque box happened to be empty, then Omega approached the agent and told them, “If you one-box, you’ll get a million. If you two-box, you’ll only get a thousand,” knowing that the agent would recognize that it’s purely a coincidence which box is full and that Omega must therefore just be joking around and trying to find roundabout ways to communicate “I checked, and the opaque box is empty.” The two sentences are vacuously true in the manner of material implications, because the antecedent of the first is known to be false and the consequent of the second is known to be true. Having received the information that the opaque box is already empty and would have been empty regardless of what decision they made, the agent can confidently take the $1000, per the second sentence.
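To spell out the “vacuously true” step, here’s a tiny sketch (my own formalization, with Omega’s sentences reduced to booleans) checking that a material conditional with a false antecedent, or with a true consequent, comes out true:

```python
# Material implication: "A implies B" is equivalent to (not A) or B.
def implies(a, b):
    return (not a) or b

will_one_box = False       # the agent sees through the joke and won't one-box
gets_million = False       # the opaque box is empty
gets_only_thousand = True  # ...so two-boxing yields only the transparent box's $1,000

print(implies(will_one_box, gets_million))            # True: antecedent false, vacuously true
print(implies(not will_one_box, gets_only_thousand))  # True: consequent true
```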
If the agent knows they’re in the first scenario, then they’ll reason the same as in Newcomb’s problem, which this is a trivial variant on: FDT and EDT one-box, and CDT two-boxes. If the agent knows they’re in the second scenario, then all three agents one-box. If the agent knows they’re in the third scenario, then all three agents two-box (i.e., take the $1000).
If the agent is uncertain about which of those scenarios they’re in, then their decision will be based on what probability they assign to being in this or that scenario.
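To see how that cashes out, here’s a minimal sketch of the probability-weighted comparison. The payoff table is my reading of how each scenario resolves, with the first scenario scored the way FDT and EDT score Newcomb’s problem (an accurate predictor, so one-boxing goes with the full opaque box); a CDT agent would keep different books for that scenario.

```python
# Sketch: scenario uncertainty as a probability-weighted choice between actions.
# Payoffs are assumptions about how each scenario resolves (see the lead-in above).

PAYOFFS = {  # PAYOFFS[scenario][action]
    "omega_plus_1": {"one-box": 1_000_000, "two-box": 1_000},
    "explosives":   {"one-box": 1_000_000, "two-box": 1_000},
    "empty_box":    {"one-box": 0,         "two-box": 1_000},
}

def best_action(probs):
    """Return the action with the higher probability-weighted payoff."""
    def eu(action):
        return sum(p * PAYOFFS[s][action] for s, p in probs.items())
    return max(["one-box", "two-box"], key=eu)

# Even a small credence in the first two scenarios favors one-boxing:
print(best_action({"omega_plus_1": 0.002, "explosives": 0.0, "empty_box": 0.998}))    # one-box
print(best_action({"omega_plus_1": 0.0005, "explosives": 0.0, "empty_box": 0.9995}))  # two-box
```

On this bookkeeping the break-even point sits at roughly 0.999 credence in the third scenario: any appreciable chance of the first two makes leaving the transparent box behind worthwhile.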
If we have a rate of successful past predictions instead of a perfect prediction assumption, doesn’t that abrogate the fixed-point problem? Or at least constrain it to the space of failed predictions?